The Biocurator: Connecting and Enhancing Scientific Data

Authors

  • Nima Salimi
  • Randi Vita
Abstract

From Impressionism and Pop Art to phosphorylation sites and interacting atom pairs, the realm of curation has been expanded. The recent growth of bioinformatics, driven by exponentially growing data, advanced computing techniques, and increased funding from private and governmental organizations, has created the need for novel strategies to adequately capture, store, and analyze the multitude of data present in the scientific literature. To meet this challenge, the number and scope of scientific databases have soared in recent years, creating a new profession, the biocurator. Indeed, the present emphasis on expanding computational resources, capable of managing and analyzing complex biological data, presents an ever-growing demand for biocurators capable of interpreting the increasingly complex scientific literature and extracting relevant data in an efficient, yet consistent, manner.
The Immune Epitope Database and Analysis Resource (IEDB) at http://www.immuneepitope.org [1,2] was established to capture, house, and analyze complex immune-epitope-related data extracted from the scientific literature by a team of specialized biocurators. Our experiences as IEDB biocurators are presented here to provide insight into the role of the biocurator and the challenges of literature-based curation of complex scientific data.
The goal of the IEDB is to provide the scientific community with open access to concise and comprehensive immunological data and analysis resources in a previously unavailable format. The IEDB catalogues epitope sequences and structures; however, we further expand the magnitude of accessible information by including data regarding the immunological contexts in which the epitopes are defined and assayed (MHC binding, T cell, B cell, or MHC ligand elution). This affords the user the ability to generate refined queries to selectively access data of interest.
To achieve this utility, our biocurators manually capture immunological data from the published literature at an unprecedented level of detail that includes data fields ranging from simple concepts such as the antigen, immunogen, and assay type to more advanced fields such as the TCR chain types, TCR residues interacting with the epitope-MHC complex, and detailed information regarding carriers or vectors. Therefore, interpretation of the highly detailed and complex experimental data included in the IEDB requires a team of graduate-level biocurators with both theoretical and research experience in immunology and related fields.
The IEDB currently employs eight full-time and two part-time scientists as biocurators. Although IEDB biocurator duties are diverse, their primary role is curation of data from the published literature. The initial curation of a typical manuscript requires approximately four hours, reflective of the high degree of detail that is captured from each reference (published article). While the granularity of the curated data distinguishes the IEDB as a novel resource, it also necessitates specific curation guidelines and a comprehensive review process that ensures accuracy and precision of each curation prior to its release into the public database. The IEDB biocurator plays a key role both in the formulation of these guidelines and in the review process.
The nature of the data relevant to the IEDB required us to establish well-defined curation guidelines to promote consistency and to clearly delineate objective representation of the data from subjective interpretation of the data. In conjunction with a group of prominent senior immunologists, known as the Epitope Council (EC), the biocurators continuously develop the Curation Manual. This manual provides precise instructions regarding the strategies and procedures for capturing, annotating, and introducing complex and detailed data from the literature into the IEDB.
The Curation Manual is used to ensure the validity, standardization, and efficiency of the curation process, and it coevolves with the database as we continually encounter circumstances that require new guidelines to be established. The current IEDB Curation Manual (version 14) is publicly available through the IEDB website.
Despite the use of our extensive Curation Manual, difficult situations inherently arise during curation. We often encounter inconsistent terminologies in the literature that present formidable challenges to our consistent interpretation of the data. Scientists frequently use highly diverse and controversial nomenclature, for example, in the naming of MHC molecules. The methods used to perform an experiment may be somewhat obscure or contradictory. The conclusions drawn by the authors may be difficult to represent given the limitations of the database fields and our curation guidelines. Newly created assay types may require interpretation and assignment to a particular assay group. Thus, valuable meetings involving the curation team

Similar resources

A Biocurator Perspective: Annotation at the Research Collaboratory for Structural Bioinformatics Protein Data Bank

Like most scientists, annotators at the Research Collaboratory for Structural Bioinformatics (RCSB) (http://www.pdb.org) dread the immortal cocktail party question "So, what do you do?" Unlike for some jobs, however, their answer can leave other scientists at the party with no response. Even within the structural biology community, our job is not well understood. Throughout this perspective,...

Biocuration of functional annotation at the European nucleotide archive

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is a repository for the submission, maintenance and presentation of nucleotide sequence data and related sample and experimental information. In this article we report on ENA in 2015 regarding general activity, notable published data sets and major achievements. This is followed by a focus on sustainable biocuration of functional a...

Introduction to circular data

In many diverse scientific fields, the measurements are directions. For instance, a biologist may be measuring the direction of flight of a bird or the orientation of an animal. A series of such observations is called "directional data". Since a direction has no magnitude, these can be conveniently represented as points on the circumference of a unit circle centered at the origin or as unit ...

Designing an Optimal Pattern of General Medical Course Curriculum: an Effective Step in Enhancing How to Learn

Introduction: In today's world with a vast amount of information and knowledge, medical students should learn how to become effective physicians. Therefore, the competencies required for lifelong learning in the curriculum must be considered. The purpose of this study was to present a desirable general medical curriculum with emphasis on lifelong learning. Methods: The present study was Mixe...

A characterization of manual literature annotations by biocurators

This paper describes work in progress to characterize the manual annotations biocurators make to journal articles when curating the scientific literature using the Gene Ontology. We examined a corpus of 87 experimental journal articles from the fruitfly literature and characterized biocurators’ manual annotations by location within the articles and by annotation type. We observed a total of 5,7...

Journal title:
  • PLoS Computational Biology

Volume 2  Issue 

Pages  -

Publication date 2006